OperatorSDK enhancement with best practices and OKT proposition #84

tapairmax · 2021-06-11T09:02:41Z

No description provided.

tapairmax · 2021-06-11T09:05:08Z

To follow up our (@orange) exchange with Tony Wu and Eric Stroczynski.
Patrick Royer

estroz · 2021-06-24T18:24:43Z

enhancements/orange_okt_proposition.md

+-    In the Controller folder I use the OKT code generator for my 3 resources I want to create. i.e.: 
+   `okt-gen-resource xxxxx`
+- In each file generated, I always find 3 parts to fill by myself and that will constitute a common/standard way to Mutate my resource:
+	- A method called `getHashXXX() { // Put your code here  }`  - It allows me to define, thanks to an OKT Helper, wich part of my object I want to include in the hash computation


How is the hash's input created? Marhsaling the object itself? I've seen quite a few cases where a time or resourceVersion value is included in a hash/comparison, with an always-true result, so exposing such a function may be a source of confusion and bugs.

Firstly for each resource, you have to define in the "getHashXXX()" method, the scope of objects/elements covered by the hash computation. OKT provides you an helper to define this scope. Internally it use an array of pointers with all the elements participating to the scope (the metadata you want, the whole Object's Spec part if you want, actually whatever you want).
You return this "bag" of collected elements through the return statement of "getHashXXX()' method.

To perform the Hash computation, OKT use the spew library (github.com/davecgh/go-spew/spew) which follows pointers and prints actual values of the nested objects ensuring the hash does not change when a pointer changes.
The returned hash can be used for object comparisons.

So if the developper don't add resourceVersion in the scope, there's no risk to meet the bug you mentionned. I know that some immutable fields (IP in Services, ...) and the Secret's data are things to take care about (OKT helper help on this point).
There is an example in the operator Memcached with OKT I sent to you and Tony (it can be shared).

So it sounds like the hash is defined by the developer and they can choose as much or as little in their hash. And I agree with Eric that if someone unknowingly adds an attribute that changes it will cause an infinite reconcilation loop. (we've done this ourselves :)

if someone unknowingly adds an attribute that change it will cause an infinite reconcilation loop

OKT comes with an integrated throttling mechanism in case of a same recurrent errors, but not sure it can help in such case...
Besides that, when the developer defines the hash in the "getHashXXX()" method, he uses an OKT helper generic or dedicated for the resource kind he is dealing with. For example, the SecretMutationHelper object manages few things with StringData and Data fields. It ensures too that the hash computation covers only the Data field and not the StringData field.
Here again, I don't know if there's a more elegant solutions and if a such helper is sufficient to guarantee the problem of the changing attributes added unknowingly.
The counterpart of these helpers, implemented by OKT, is the need to maintain them. Today, the only specific one that exists (the others are generic helpers), is the SecretMutationHelper. I thought to need another one specific for the Service resource and its changing ClusterIP, but in practice all works correctly without special care, it seems it was a mistake to think that.

Edited the last sentence regarding specifics vs generics helpers.

estroz · 2021-06-24T18:30:00Z

enhancements/orange_okt_proposition.md

+1. I create a project with the Operator SDK command as usual
+2. In my Controller, I choose an OKT Reconciler and add the Reconciliation steps function in which I can see distinctly the standard steps in which I should pass at each reconciliation event (each step below is an entry in an embracing `switch(step) { case "xxx": ... }`:
+	- **CR Checker** - once the CR is picked up by OKT and validated by webhooks (if any), I can tell here if I have something to do more on my CR
+	- **ObjectGetter** - Here I will add the code to create my 3, in memory, resources and add them to the OKT's registry of resources (still in memory). However at this stage OKT pick up the resource on the K8S Cluster so it knows if the resource in its registry has an existing peer on the Cluster (or cache).


Assuming you mean that the operator registers 3 types with OKT, this sounds a lot like controller-runtime's cache.

I must admit don't know lot of things about the controller-runtime's cache implementation.

At OKT level it is an array of a generic resource type, that OKT uses at the "Mutator" and "Updater" reconciliation steps to loop on each resource and call respectively the "getHashXXX(), MutateWithInitialData(), MutateWithCR()" (for mutation) and a specific "CreateOrUpdate()" OKT method.

estroz · 2021-06-24T18:37:25Z

enhancements/orange_okt_proposition.md

+
+Today, the Story 1 is fully operational and implementable with the current version of OKT. A partial implementation detail can be provided by the MemcachedOperatorSampleWithOKT repo code.
+
+The architecture to render story 2 implementable is not yet fully completed and we are wondering if this approach make sense or not.


I think the concept of a controller state machine makes a lot of sense, and one generic enough to register an arbitrary application lifecycle for an operator to execute would be powerful. I think such a library warrants an enhancement on its own, with a prototype ideally.

Yes thank you, the concept is interresting and I have some ideas to render the LC graph implementation generic at some points. Note that the K8S resources reconciliation steps (CRChecker(), ObjectsGetter(), ..., SuccessManager() ) are already implemented thanks to a StateMachine library (Qor transition), it is a bit overrated because initialy I thought about a more complex reconciliation process, but it's done with that right now.
It is clear now, that it is better to separate in 2 distincts processes the K8S resources reconciliation and the Application's LC graph stages.
Besides that, some ideas comes in mind. We could manage the state changes through a standard events formalizations as the one brought by the CloudEvents specs and lib. Providing a standard way to observe and pilot the state change seems interesting. So we already are looking for integrating CloudEvents in OKT for this purpose.

👍 agree that this state machine idea sounds quite interesting.

estroz · 2021-06-24T18:38:57Z

enhancements/orange_okt_proposition.md

+
+The architecture to render story 2 implementable is not yet fully completed and we are wondering if this approach make sense or not.
+
+The OKT library is a GO module that depends on the OperatorSDK, more specifically on the sigs and k8s.io modules aligned on those used by the OperatorSDK.


Can you clarify exactly what you mean by "depends on the OperatorSDK"? The operator-sdk repo contains a binary that scaffolds code from controller-runtime, with no public libraries; perhaps you mean OKT depends on the latter?

Yes there's no real dependancies except that OKT has no real sense without the usage of the OperatorSDK as OKT is done to be used inside an OperatorSDK's scaffolded project.
Besides that, each time the librairies used by the OperatorSDK evolves (sigs, ...), we have to align the OKT librairy as well.

estroz · 2021-06-24T18:50:07Z

@tapairmax once the OKT library is public, can you link it here? Thanks!

jmrodri

A few changes to fix some spelling errors. Overall, a great proposal.

enhancements/orange_okt_proposition.md

Co-authored-by: Jesus Rodriguez <[email protected]>

varshaprasad96

The proposal seems really interesting. It would be helpful to have the additional capabilities of OKT integrated in SDK or controller-runtime. Is there any place we can take a look at the sample Memcached Operator built using OKT (which is mentioned in proposal) ?

varshaprasad96 · 2021-08-05T22:45:53Z

enhancements/orange_okt_proposition.md

+	- **Updater** - Here OKT, thanks to a hash algorithm on resources data, will compute if after Mutation, the resource has to be Created, Update, or is unchanged against it's cluster peer instance. So for all resource in its registry it apply the same idempotent process to update (or not!!) the resource.
+	- **SuccessManager** - If all goes right (no error) , this stage is reached, and here I'll can tell to OKT to "manageSuccess()" i.e. check for the right requeuing value to return and complete this reconciliation but at same time perform the update of a status condition for the Reconciliation, transparently for me, if I passed the CR.Status when I instatiate my OKT Reconciler object. 
+	- At any previous stage, if something goes wrong, an error can be raised, there's also the case where we giveup the reconciliation for whatever reason (not an error). There is also the case where a CR Finalization is triggered. These "debranching" cases are also present in my Reconciliation steps function thanks to 2 dedicated steps: 
+		-    **CRFinalizer** - here I can perform my own tasks when OKT detected that a finalizer (with the right name) exists and the CR is being deleted.


Is this similar to the finalizer helpers present in controller-runtime - https://github.com/kubernetes-sigs/controller-runtime/tree/master/pkg/finalizer

Actually, the OKT Reconciler just check if a finalizer in CR Metadata with the same name as itself exists. If it's the case and the CR is being deleted, the stage "CRFinalizer" is called right after the stage "CRChecker" instead of continuing the normal process (ObjectsGetter, ...).
The principle is to do what you need in the CRFinalizer stage and call a provided method to delete the finalizer.
So it is just a way to manage a single finalizer on the CR.
I guess the helper you mentioned is generic as well to manage several finalizers...

varshaprasad96 · 2021-08-05T22:51:21Z

enhancements/orange_okt_proposition.md

+	- **SuccessManager** - If all goes right (no error) , this stage is reached, and here I'll can tell to OKT to "manageSuccess()" i.e. check for the right requeuing value to return and complete this reconciliation but at same time perform the update of a status condition for the Reconciliation, transparently for me, if I passed the CR.Status when I instatiate my OKT Reconciler object. 
+	- At any previous stage, if something goes wrong, an error can be raised, there's also the case where we giveup the reconciliation for whatever reason (not an error). There is also the case where a CR Finalization is triggered. These "debranching" cases are also present in my Reconciliation steps function thanks to 2 dedicated steps: 
+		-    **CRFinalizer** - here I can perform my own tasks when OKT detected that a finalizer (with the right name) exists and the CR is being deleted.
+		-    **ErrorManager** - This stage can be reached after any other stages, and let me call the OKT "manageError()" method that will pickup the last error and return the right requeuing value to complete this reconciliation.


Would it be useful to explain in detail on how the OKT Reconciler is different from the generic controller-runtime reconciler which is used in operator-sdk ? In the sense the aspects which are missing from the current implementation or the additional features which OKT Reconciler implements to fill the gaps.

Yes a good explanation by a sample would help. This the the reason I'll provide a more complete version of the Memcached operator sample. I believed Eric S. got a tarball with the first version of Memcached aligned with the "User Story 1" in this issue, feel free to go it.
Now, I'm very close to get the official GO to publish OKT on Github, I think there's no issue to drop a new Memcached operator sample under my Github space. I'll try to release it nex week.

varshaprasad96 · 2021-08-05T22:57:51Z

enhancements/orange_okt_proposition.md

+OKT comes with a statemachine feature that should help in defining these steps and let me focus on the code I need to implement at each step. 
+To allow this, OKT provides:
+
+  -  a sidecar for my application to help me to get my database status and launch actions on it asynchronously.


This seems to be really useful. But just curious, would conditions be useful here to expose the status of CR. Please correct me if this is totally tangential to what is being described here.

I'm not sure to well capture your point but I guess you're wondering if the CR status is the right place to store the conditions that represent the application's state.
I had in mind the ability to offer a simple way to follow the application state thanks to a "kubectl wait" command for example. But actually it sounds a bit unsuitable to me too, so may be that was the reason of your question :).
So it's a point still unstated today. Besides that, building a specific interface relying on the CloudEvent api could be a better option either in term of interoperability with external world and may be for others internal needs too...

tapairmax · 2021-08-09T18:47:41Z

You can ask me access to the memcached operator sample implemented with OKT.

jmccormick2001 · 2021-08-09T18:52:58Z

The memcached operator sample implemented with OKT is availaible here.
You can ask me access.

It includes a proposition related to the usage of a statemachine to manage an application life cycle here

links appear to be broken.

tapairmax · 2021-08-09T20:52:41Z

The memcached operator sample implemented with OKT is availaible here.
You can ask me access.
It includes a proposition related to the usage of a statemachine to manage an application life cycle here

links appear to be broken.
Thanks I updated the post.

tapairmax · 2021-08-11T15:39:14Z

Here we are ! We had the GO to publish in Open Source community the "Operators Karma Tools" and it is DONE.

It is published in Orange Github here.

The Memcached operator with OKT can both be tested.

As long as OTK resides on the Orange repo, it could only evolve slowly.
The aim is to continue the work already done on the Applications Life Cycle management thanks to a State Machine.

Last edit: adjust a bit the message.

Adding the issue for enhancement with OKT propostion

a0c5bd4

tapairmax changed the title ~~Adding the issue for enhancement with OKT proposition~~ OperatorSDK enhancement with best practices and OKT proposition Jun 11, 2021

estroz reviewed Jun 24, 2021

View reviewed changes

jmrodri reviewed Jun 29, 2021

View reviewed changes

tapairmax and others added 21 commits June 30, 2021 15:27

Update enhancements/orange_okt_proposition.md

79ba8e1

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

1be1265

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

573f751

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

3ec4962

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

7613554

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

da3f6a8

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

94a0c48

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

2b32a4c

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

a22ac0d

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

fef8d58

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

dc79d66

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

7218f86

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

055505d

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

b270f4e

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

dd00726

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

975b949

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

9f42327

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

2fdb82d

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

aad6ed1

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

1981452

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

32c0f14

Co-authored-by: Jesus Rodriguez <[email protected]>

tapairmax and others added 2 commits June 30, 2021 15:43

Update enhancements/orange_okt_proposition.md

8d2da32

Co-authored-by: Jesus Rodriguez <[email protected]>

Update enhancements/orange_okt_proposition.md

5e68ae6

Co-authored-by: Jesus Rodriguez <[email protected]>

varshaprasad96 reviewed Aug 5, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OperatorSDK enhancement with best practices and OKT proposition #84

OperatorSDK enhancement with best practices and OKT proposition #84

tapairmax commented Jun 11, 2021

tapairmax commented Jun 11, 2021

estroz Jun 24, 2021

tapairmax Jun 25, 2021

jmrodri Jun 29, 2021

tapairmax Jun 30, 2021 •

edited

Loading

tapairmax Jun 30, 2021

estroz Jun 24, 2021

tapairmax Jun 25, 2021

estroz Jun 24, 2021

tapairmax Jun 25, 2021 •

edited

Loading

jmrodri Jun 29, 2021

estroz Jun 24, 2021 •

edited

Loading

tapairmax Jun 25, 2021

estroz commented Jun 24, 2021

jmrodri left a comment

varshaprasad96 left a comment

varshaprasad96 Aug 5, 2021

tapairmax Aug 6, 2021

varshaprasad96 Aug 5, 2021 •

edited

Loading

tapairmax Aug 6, 2021

varshaprasad96 Aug 5, 2021

tapairmax Aug 6, 2021 •

edited

Loading

tapairmax commented Aug 9, 2021 •

edited

Loading

jmccormick2001 commented Aug 9, 2021

tapairmax commented Aug 9, 2021

tapairmax commented Aug 11, 2021 •

edited

Loading


		Today, the Story 1 is fully operational and implementable with the current version of OKT. A partial implementation detail can be provided by the MemcachedOperatorSampleWithOKT repo code.

		The architecture to render story 2 implementable is not yet fully completed and we are wondering if this approach make sense or not.


		The architecture to render story 2 implementable is not yet fully completed and we are wondering if this approach make sense or not.

		The OKT library is a GO module that depends on the OperatorSDK, more specifically on the sigs and k8s.io modules aligned on those used by the OperatorSDK.

OperatorSDK enhancement with best practices and OKT proposition #84

Are you sure you want to change the base?

OperatorSDK enhancement with best practices and OKT proposition #84

Conversation

tapairmax commented Jun 11, 2021

tapairmax commented Jun 11, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tapairmax Jun 30, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tapairmax Jun 25, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

estroz Jun 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

estroz commented Jun 24, 2021

jmrodri left a comment

Choose a reason for hiding this comment

varshaprasad96 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

varshaprasad96 Aug 5, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tapairmax Aug 6, 2021 • edited Loading

Choose a reason for hiding this comment

tapairmax commented Aug 9, 2021 • edited Loading

jmccormick2001 commented Aug 9, 2021

tapairmax commented Aug 9, 2021

tapairmax commented Aug 11, 2021 • edited Loading

tapairmax Jun 30, 2021 •

edited

Loading

tapairmax Jun 25, 2021 •

edited

Loading

estroz Jun 24, 2021 •

edited

Loading

varshaprasad96 Aug 5, 2021 •

edited

Loading

tapairmax Aug 6, 2021 •

edited

Loading

tapairmax commented Aug 9, 2021 •

edited

Loading

tapairmax commented Aug 11, 2021 •

edited

Loading